Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix double utf8 encoding for payloads #81

Merged

Conversation

darrachequesne
Copy link
Member

Following #80

Currently, a payload containing only strings (no binary) transmitted over polling transport gets utf8 encoded / decoded twice, one by the transport (the content type is set to text/plain; charset=UTF-8) and one here (encodePacket is called with utf8encode to true).

Fixes socketio/engine.io#315

@@ -255,13 +261,6 @@ module.exports = function(parser) {
});
});

it('should err on invalid utf8', function () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That case (string payload non utf8-encoded) should not happen.

@darrachequesne darrachequesne mentioned this pull request Dec 17, 2016
@darrachequesne darrachequesne merged commit 181acef into socketio:master Dec 21, 2016
@darrachequesne darrachequesne deleted the fix/double-utf8-encoding branch December 21, 2016 08:15
@darrachequesne
Copy link
Member Author

Breaking change!

@darrachequesne darrachequesne added this to the 2.0.0 milestone Dec 23, 2016
@Andarist
Copy link

Andarist commented Mar 2, 2017

@darrachequesne
even if that's a breaking change it breaks wrong behaviour to the correct one, couldnt it be released as patch in v1, or something like @next? would like to benefit from this, but making own fork of the repo requires me to fork all 3 - socket.io-client uses engine.io-client which uses engine.io-parser

Also im wondering about this fragment of the source code.

We are using on the server go's client instead of node's and as its more low-level programming language it tends to think more in terms of bytes than in characters. So basically when meeting a char like ą it would expect this header length to be longer by 1, unfortunately its calculated via .length instead of counting bytes, so it crashes on server side.

However im not sure how original node.js client works and what it expects as this header length - therefore where should this be fixed? here in the parser? or there in server's golang implementation?

For now I get it work using heavy monkey-patching but would like to get rid of this asap :trollface:

const originalEncode = parser.encodePacket
parser.encodePacket = (packet, supportsBinary, utf8encode, callback) => originalEncode(
	packet,
	supportsBinary,
	false,
	msg => callback(Object.defineProperty(
		{ toString: () => msg },
		'length',
		{ get: () => utf8.encode(msg).length }
	))
)

@darrachequesne
Copy link
Member Author

@Andarist actually I'm not sure if it should be backported, since different versions of server/client would not be able to understand each other, would they?

Yes, it seems the parser expect that the payload is already utf8 decoded, I'm not sure if that's correct. Related: #87

@Andarist
Copy link

@darrachequesne
do u know how many versions were bugged? how many would be incompatible?

also i aint sure how the protocol should be working exactly, especially in context of various transports combinations - but my monkey patch mentioned before was applied just to encoding function, decoding seems to work just fine, but this PR changes both

Note: i think that the frame length should be counted in future releases of protocol in bytes than utf8 chars, it simply makes more sense for more backends.

@getnamo
Copy link

getnamo commented Nov 7, 2017

What updates are needed to enable socket.io 2.0 support for the cpp client? socketio/socket.io-client-cpp#173

al42and added a commit to al42and/gps-web-tracker that referenced this pull request Apr 20, 2020
This solves a minor, but nasty UTF-8 encoding-deconding problem, solved
by socketio/engine.io-parser#81 and released
since socket.io v2.0.0. Maybe the newer version would work too, but I'm
too lazy to check right now.
@aleen42
Copy link

aleen42 commented Mar 17, 2021

I don't think that double utf8 encoding for payloads is a problem especially when browsers do not encode payloads as expected like socket servers do not respond with Content-Type: text/plain; charset=UTF-8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Double encoding/decoding UTF8
4 participants